@Patrick-Beuks
Contributor

Add a raw mode when parsing diffs so that the index needed to fetch blobs is always available.
Add a deprecation notice when a diff is parsed without raw information; parsing continues to work as it did before.
Fixes #227

I would like feedback on the deprecation notice, and I would prefer to test against the foobar repository instead of passing in the diff manually.

@Patrick-Beuks Patrick-Beuks marked this pull request as ready for review January 29, 2025 12:23
Member

@lyrixx lyrixx left a comment

Thanks

@lyrixx lyrixx merged commit 64d8e79 into gitonomy:main Jan 29, 2025
6 checks passed
@Patrick-Beuks Patrick-Beuks deleted the fix-empty-index branch January 29, 2025 16:11
}
$this->consumeNewLine();
} elseif (!$this->isFinished()) {
trigger_error('Using Diff::parse without raw information is deprecated. See https://github.com/gitonomy/gitlib/issues/227.', E_USER_DEPRECATED);

@veewee veewee Dec 5, 2025

Hello there @Patrick-Beuks @lyrixx,

I'm just wondering:

Would this actually have to result in a deprecation warning?
For other tools using the DiffParser (like grumphp), we don't really care much about the raw index data.

See phpro/grumphp#1199

Contributor Author

There are edge cases where parsing a diff without raw information gives an incorrect result, such as not finding changed files; see issue #227.
Tomorrow I will check the use case you described in your ticket to see whether you would have hit the same edge cases.

Contributor Author

Okay, I have checked out your code.

You have a public interface on the diff in src/Git/GitRepository.php that returns the Gitonomy diff.
Without --raw, this diff can contain incorrect data, and to protect consumers of this function from false information, I still think this deprecation is needed.

For example, consider this code for spec/Git/GitRepositorySpec.php:

    public function it_can_not_get_index_without_raw_without_changes(Repository $repository): void
    {
        $rawDiff = 'diff --git a/testfile b/testfile
old mode 100644
new mode 100755
';

        $diff = @$this->createRawDiff($rawDiff);
        $diff->shouldBeAnInstanceOf(Diff::class);
        /** @var File $file */
        $file = $diff->getFiles()[0];
        $file->getNewIndex()->shouldBe('');
        $file->getOldIndex()->shouldBe('');
    }

    public function it_can_get_index_with_raw(Repository $repository): void
    {
        $rawDiff = ':100644 100755 e69de29 e69de29 M        testfile

diff --git a/testfile b/testfile
old mode 100644
new mode 100755
';
        $diff = $this->createRawDiff($rawDiff);
        $diff->shouldBeAnInstanceOf(Diff::class);
        /** @var File $file */
        $file = $diff->getFiles()[0];
        $file->getNewIndex()->shouldBe('e69de29');
        $file->getOldIndex()->shouldBe('e69de29');

    }

    public function it_can_get_index_with_changes(Repository $repository): void
    {
        $rawDiff = 'diff --git a/testfile b/testfile
index e69de29..9daeafb 100755
--- a/testfile
+++ b/testfile
@@ -0,0 +1 @@
+test
';
        $diff = $this->createRawDiff($rawDiff);
        $diff->shouldBeAnInstanceOf(Diff::class);
        /** @var File $file */
        $file = $diff->getFiles()[0];
        // In "index e69de29..9daeafb", the first hash is the old blob and the second is the new one.
        $file->getNewIndex()->shouldBe('9daeafb');
        $file->getOldIndex()->shouldBe('e69de29');

    }

As you can see, without raw information or content changes, the object ends up in an undesirable state.

(BTW thank you for letting me take another look at this, as I have found a bug with multiple files as fileIndex never increases)


@veewee veewee Dec 8, 2025

Thanks for looking into it.
Sorry for my questions here, I'm just trying to understand :)

I'm trying to understand what is problematic about this, but I can't see why it would have a negative impact on our application:
We don't use the new / old index anywhere. We only try to find the files from the git diff.

The way I see it:

  • A diff can have different formats. When no --raw flag is passed, it will be missing some information indeed.
  • A diff parser is responsible for parsing diffs in most common formats, but if some information is not available it just doesn't provide this information to the result of the parse.
  • It's up to the piece of code consuming the diff to figure out it has all the information it needs.

An implementation could for example check if the indexes are set from the diff to check it is a file mode change or a difference in file content as well, without parsing the raw headers.
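
A minimal sketch of that consumer-side check, assuming PHP 8 and that the two strings are whatever `getOldIndex()` and `getNewIndex()` returned for a file. The function name `classifyIndexState` and its return labels are hypothetical and not part of gitlib; the three cases mirror the spec examples quoted earlier in this thread.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of gitlib): decide what the index strings
 * of a parsed file entry can tell a consumer.
 */
function classifyIndexState(string $oldIndex, string $newIndex): string
{
    // No hashes at all, e.g. a mode-only change parsed without --raw.
    if ($oldIndex === '' || $newIndex === '') {
        return 'index-unavailable';
    }

    // Same blob on both sides: the content did not change (mode change, for instance).
    if ($oldIndex === $newIndex) {
        return 'content-unchanged';
    }

    return 'content-changed';
}
```

Note that an empty index only says the information is missing; it does not by itself prove that the change was a mode change.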

GrumPHP uses a diff as STDIN for the run command, so we don't always have control over the exact command the diff is created with. (We do have some presets of git hooks in which we have control, but users have the ability to overwrite those.)

Contributor Author

@Patrick-Beuks Patrick-Beuks Dec 8, 2025

Without raw, the Diff contains the index information most of the time. This library even expects index information to always be available on the Diff object.

If you do not use the index information, I would suggest suppressing the deprecation warning and wrapping the Diff object in your own object that leaves out the index information, to make it clear to users of your library that it is not accessible.
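
A rough sketch of what that could look like, assuming PHP 8. `ChangedFiles` and `withoutUserDeprecations` are hypothetical names, and the closure in the usage part is only a stand-in for the real parse call; in real code the paths would be read from the parsed Diff.

```php
<?php
declare(strict_types=1);

/** Hypothetical wrapper: exposes the changed paths only, no index data. */
final class ChangedFiles
{
    /** @param list<string> $paths */
    public function __construct(private array $paths)
    {
    }

    /** @return list<string> */
    public function paths(): array
    {
        return $this->paths;
    }
}

/** Runs $parse while swallowing E_USER_DEPRECATED notices and nothing else. */
function withoutUserDeprecations(callable $parse): mixed
{
    // The mask limits the handler to E_USER_DEPRECATED; returning true marks it as handled.
    set_error_handler(static fn (int $level, string $message): bool => true, E_USER_DEPRECATED);

    try {
        return $parse();
    } finally {
        restore_error_handler();
    }
}

// Usage with a stand-in for the real parser call (assumption, for illustration only).
$files = withoutUserDeprecations(static function (): ChangedFiles {
    trigger_error('Using Diff::parse without raw information is deprecated.', E_USER_DEPRECATED);

    return new ChangedFiles(['testfile']);
});
```

Compared with the `@` operator, the scoped handler hides only the deprecation notice, so other warnings raised during parsing stay visible.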

On the other hand, if users can supply their own diff, then it might be an idea to let the deprecation stand so that they can add the raw flag.

Side note: this function does more than just get the changed files. If that is all you need, there might even be a faster option, as this parses the entire diff.
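
One possible shape for such a faster option, as a sketch only and assuming PHP 8: scan the header lines instead of parsing every hunk. The function name `changedPathsFromDiff` is hypothetical. It assumes the default `a/` and `b/` prefixes and unquoted paths, and it relies on the `rename to` / `copy to` header lines for renamed or copied files.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collect changed paths from unified diff headers only.
 *
 * @return list<string>
 */
function changedPathsFromDiff(string $diff): array
{
    $paths = [];

    foreach (preg_split('/\R/', $diff) as $line) {
        // Renames and copies state the new path on their own header line.
        foreach (['rename to ', 'copy to '] as $prefix) {
            if (str_starts_with($line, $prefix)) {
                $paths[] = substr($line, strlen($prefix));
            }
        }

        if (!str_starts_with($line, 'diff --git a/')) {
            continue;
        }

        // For a non-rename the rest reads "a/P b/P", so it splits exactly in the middle.
        $rest = substr($line, strlen('diff --git '));
        $half = intdiv(strlen($rest) - 1, 2);
        $old = substr($rest, 0, $half);
        $new = substr($rest, $half + 1);

        if (strlen($rest) % 2 === 1
            && $rest[$half] === ' '
            && str_starts_with($new, 'b/')
            && substr($old, 2) === substr($new, 2)
        ) {
            $paths[] = substr($new, 2);
        }
    }

    return array_values(array_unique($paths));
}
```

Hunk lines always start with a space, `+` or `-`, so they cannot be mistaken for a header here. Paths that git quotes (for example those with special characters) are not handled by this sketch.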

(BTW I am just defending my justification for the deprecation as an end user of the library, in the end it is up to @lyrixx if this is a valid use case)

Development

Successfully merging this pull request may close these issues.

ProcessException on change file if only mode changes

3 participants